Frame-Wise Cross-Modal Matching for Video Moment Retrieval
Abstract
Video moment retrieval aims to retrieve a golden moment in a video for a given natural language query. The main challenges of this task include 1) the requirement of accurately localizing (i.e., finding the start time and end time of) the relevant moment in an untrimmed video stream, and 2) bridging the semantic gap between the textual query and the video contents. To tackle those problems, early approaches adopt a sliding window or uniform sampling to collect video clips first and then match each clip with the query to identify relevant clips. These strategies are time-consuming and often lead to unsatisfactory localization accuracy due to the unpredictable length of the golden moment. To avoid these limitations, researchers have recently attempted to directly predict the moment boundaries without generating video clips first. One mainstream approach is to generate a multimodal feature vector for the target query and video frames (e.g., by concatenation) and then use a regression approach upon the multimodal feature vector for boundary detection. Although some progress has been achieved by this approach, we argue that those methods have not well captured the cross-modal interactions between the query and the video frames. In this paper, we propose an Attentive Cross-modal Relevance Matching (ACRM) model which predicts the temporal boundaries based on interaction modeling between the two modalities. In addition, an attention module is introduced to automatically assign higher weights to query words with richer semantic cues, which are considered to be more important for finding relevant video contents. Another contribution is an additional predictor that utilizes the internal frames in model training to improve the localization accuracy. Extensive experiments on two public datasets, TACoS and Charades-STA, demonstrate the superiority of our method over several state-of-the-art methods. Ablation studies are also conducted to examine the effectiveness of the different modules in our ACRM model.
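The frame-wise matching idea described in the abstract can be illustrated with a small sketch. This is not the ACRM model itself: the abstract gives no equations, so everything here is an assumption made for illustration. The function names (attentive_query, frame_relevance, longest_relevant_span), the use of cosine similarity as the relevance score, and the thresholded longest-run rule that stands in for the learned boundary predictors are all hypothetical. The per-word attention scores, which a trained attention module would produce, are supplied by hand.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_query(word_vecs, word_scores):
    """Pool word vectors into one query vector, weighting words by attention.

    word_scores are assumed inputs here; in a trained model they would come
    from a learned attention module.
    """
    weights = softmax(word_scores)
    dim = len(word_vecs[0])
    return [sum(w * vec[d] for w, vec in zip(weights, word_vecs)) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def frame_relevance(frame_vecs, query_vec):
    """Score every frame against the pooled query (frame-wise matching)."""
    return [cosine(f, query_vec) for f in frame_vecs]

def longest_relevant_span(relevance, threshold=0.5):
    """Return (start, end) frame indices of the longest run scoring >= threshold.

    A simple heuristic swapped in for learned start/end predictors.
    Returns None if no frame reaches the threshold.
    """
    best, start = None, None
    for i, r in enumerate(relevance):
        if r >= threshold and start is None:
            start = i
        elif r < threshold and start is not None:
            if best is None or i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    if start is not None:
        end = len(relevance) - 1
        if best is None or end - start + 1 > best[1] - best[0] + 1:
            best = (start, end)
    return best

# Toy example: two query words, four frames, 2-dimensional features.
words = [[1.0, 0.0], [0.0, 1.0]]
scores = [2.0, 0.0]  # the first word is treated as carrying richer cues
frames = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
query = attentive_query(words, scores)
relevance = frame_relevance(frames, query)
span = longest_relevant_span(relevance)
```

In this toy setup the two middle frames align with the highly weighted word, so the selected span covers frame indices 1 to 2. A real system would learn both the attention scores and the boundary predictors from annotated moments rather than use a fixed threshold.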
Similar resources
Cross-modal Embeddings for Video and Audio Retrieval
The increasing number of online videos brings several opportunities for training self-supervised neural networks. The creation of large-scale video datasets such as YouTube8M allows us to deal with this large amount of data in a manageable way. In this work, we find new ways of exploiting this dataset by taking advantage of the multi-modal information it provides. By means of a neural net...
Cross-Modal Manifold Learning for Cross-modal Retrieval
This paper presents a new scalable algorithm for cross-modal similarity preserving retrieval in a learnt manifold space. Unlike existing approaches that compromise between preserving global and local geometries, the proposed technique respects both simultaneously during manifold alignment. The global topologies are maintained by recovering underlying mapping functions in the joint manifold spac...
MHTN: Modal-adversarial Hybrid Transfer Network for Cross-modal Retrieval
Cross-modal retrieval has drawn wide interest for retrieval across different modalities of data (such as text, image, video, audio and 3D model). However, existing methods based on deep neural networks (DNNs) often face the challenge of insufficient cross-modal training data, which limits training effectiveness and easily leads to overfitting. Transfer learning is usually adopted for relievin...
An Efficient Adaptive Boundary Matching Algorithm for Video Error Concealment
Sending compressed video data in error-prone environments (like the Internet and wireless networks) might cause data degradation. Error concealment techniques try to conceal errors in the received data on the decoder side. In this paper, an adaptive boundary matching algorithm is presented for recovering the damaged motion vectors (MVs). This algorithm uses an outer boundary matching or directional tempo...
OPTIMAL DESIGN OF COLUMNS FOR AN INTERMEDIATE MOMENT FRAME UNDER UNIAXIAL MOMENT AND AXIAL LOADS
The present study addresses the optimal design of reinforced concrete (RC) columns based on equivalent equations, considering the deformability regulations of ACI318-14 under axial force and uniaxial bending moment. Contrary to common design approaches that rely on trial and error, this study first presents an exact solution for intensity of longitudinal reinforcement in column section by...
Journal
Journal title: IEEE Transactions on Multimedia
Year: 2022
ISSN: 1520-9210, 1941-0077
DOI: https://doi.org/10.1109/tmm.2021.3063631